FAST INVERSE SQUARE ROOT 
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ABSTRACT. Computing reciprocal square roots is necessary in many 
applications, such as vector normalization in video games. Often, 
some loss of precision is acceptable for a large increase in speed. 
This note examines and improves a fast method found in source- 
code for several online libraries, and provides the ideas to derive 
similar methods for other functions.! 


1. INTRODUCTION 


Reading the math programming forum on www.gamedev.net |1], I 
ran across an interesting method to compute an inverse square root. 
The C code was essentially (my comments): 


float InvSqrt(float x) 


{ 

float xhalf = 0.5f*x; 

int i = *(int*)&x; // get bits for floating value 

i = 0x5f3759df - (i>>1); // gives initial guess yo 

x = *(float*) &i; // convert bits back to float 

x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy 
return x; 

} 


The interesting part was the constant 0x5£3759df: where did it 
come from and why does the code work? Some quick testing in Visual 
C++.NET [2] showed the code above to be roughly 4 times faster than 
the naive (float) (1.0/sqrt(x)), and the maximum relative error 
over all floating point numbers was 0.00175228, so the method seems 
very useful. Three immediate questions were: 1) how does it work, 2) 
can it be improved, and 3) what bit master designed such an incredible 
hack? 

A quick search on Google for 0x5£3759df returned several hits, 
the most relevant being a reference to a thread about this code on 
comp.graphics.algorithms from Jan 9, 2002 [3], and an (incorrect, 
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but close) explanation by D. Eberly [4]. However his explanation is illu- 
minating. Further digging found no correct explanation of this code. It 
appears in the sourcecode for Quake 3, written by legendary game pro- 
grammer John Carmack, but someone on the net (I cannot now find the 
reference) credited it to Gary Tarolli who was at Nvidia. Can anyone 
confirm authorship? It also appears in the Crystal Space sourcecode 
[5], the Titan Engine [6], and the Fast Code Library, although each 
seems to derive from Quake 3. 

The motivation to try such an algorithm is more clearly explained 
in Eberly [4], where he assumes the shift creates a linear interpolation 
to the inverse square root. Note there are several ways to speed up 
this code, but this note will not go into further optimizations. There 
are also faster methods, perhaps table lookup tricks, but most of them 
have less accuracy than this method. 

This note assumes PC architecture (32 bit floats and ints) or similar. 
In particular the floating point representation is IEEE 754-1985 [7]. 
This C code has been reported to be endian-neutral (supposedly tested 
it on a Macintosh). I have not verified this. Since the method works 
on 32 bit numbers it seems (surprisingly) endian-neutral. It is easy to 
extend the code to other situations and bit sizes (such as type double) 
using the ideas in this note. Anyway, on to the problems 1), 2), and 
3). 


2. BACKGROUND 


Floating point numbers are stored in the PC as 32 bit numbers; 


s E M | 
bit 31 | 30 — bits — 23 | 22 — bits — 0 | 


where s is a 1 bit sign (1 denotes negative), F is an 8 bit exponent, and 
M is a 23 bit mantissa. The exponent is biased by 127 to accommodate 
positive and negative exponents, and the mantissa does not store the 
leading 1, so think of M as a binary number with the decimal point to 
the left, thus M is a value in J = [0, 1). The represented value is 


g = (-1)*(1+ M)2*-%7 


These bits can be viewed as the floating point representation of a 
real number, or thinking only of bits, as an integer. Thus M will be 
considered a real number in J or as an integer, depending on context. 
M as areal number is M as an integer divided by 27%. 
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3. THE ALGORITHM 


The main idea is Newton approximation, and the magic constant 
is used to compute a good initial guess. Given a floating point value 
x > 0, we want to compute J Define f(y) = z — x. Then the value 
we seek is the positive root of f(x). Newton’s root finding method, 
given a suitable approximation yn to the root, gives a better one Yn+1 
using 

f (Yn) 
ee 7 


For the f(y) given, this simplifies to yn. = 5Yn(3 — xy?) which 
corresponds to the line of code x = x*(1.5f-xhalf*x*x) , where x is 
the initial guess, which hereafter will be called yo. 

The line of code i = Ox5£3759df - (i>>1) computes this initial 
guess yo, roughly by multiplying the exponent for x by —5, and then 
picking bits to minimize error. i>>1 is the shift right operator from 
C, which shifts the bits of an integer right one place, dropping the 
least significant bit, effectively dividing by 2. Eberly’s explanation was 
that this produced linear approximation, but is incorrect; we’ll see the 
guess is piecewise linear, and the function being approximated is not 
the same in all cases. However I would guess his explanation was the 
inspiration for creating this algorithm. 


4. THE MAGIC NUMBER(S) 


Thus we are left with finding an initial guess. Suppose we are given 
a floating point value x > 0, corresponding to the 32 bits 


0;E[M] 


as above. Let the exponent e = E — 127. Then x = (1+ M)2°, and the 
desired value y = a = sa treating e and M as real numbers, 
NOT as integers. For the general case we take a magic constant R, 
and analyze yo = R—(i>>1), where the subtraction is as integers, i is 
the bit representation of z, but we view yo as a real number. R is the 
integer representing a floating point value with bits 


0[ Ri | P| 


i.e., Rı in the exponent field and Rə in the mantissa field. When we 
shift i right one bit, we may shift an exponent bit into the mantissa 
field, so we will analyze two cases. 

For the rest of this note, for a real number a let |a] denote the 
integer less than or equal to a. 
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4.1. Exponent E Even. In this case, when we use i>>1, no bits from 
the exponent FE are shifted into the mantissa M, and |E/2| = E/2. 
The true exponent e = E — 127 is odd, say e = 2d + 1, d an integer. 
The bits representing the initial guess give the new exponent 


Rı — |E/2] = R,- E/2 
= pee 
2 
2d+1+12 
= Pre e 
2 
= R,-—64-d 


We require this to be positive. If it were negative the resulting sign 
bit would be 1, and the method fails to return a positive number. If this 
result is 0, then the mantissa part could not borrow from the exponent, 
which we will see below is necessary. Since this must be true for any 
even E € [0..254], we require Ry > 128. 

Since the exponent F is even, no bits from it shift into the mantissa 
under i>>1, so the new mantissa is Rə — | M/2] (as integers), assuming 
Rə > |M/2]. The initial guess is then 


Yo = (1 aie R — WE DOC ere oie 
= (1 + Rə — M/2)2®:71-4 
n (2 ae ORs = Aah eee 


where M/2 replaced |M/2], since the resulting error is at most 2774 
in the mantissa (as a real number), which is insignificant compared to 
other errors in the method. 

If Rə < M/2, then upon subtracting R—(i>>1) as integers, the bits 
in the mantissa field will have to borrow one from the exponent field 
(this is why we needed the new exponent positive above), thus dividing 
yo by two. The bits in the mantissa field will then be (1+ R2)—|M/2], 
which are still in J. Thus if Rp < M/2 


Yo = (1 a (1 mit Ro = Mio ote 
= (2 ae Ro a Mp geet 


where we write the exponent the same as in the non-carrying case. 
If we define 


Ro 2Rə— M : R > M/2 
M~) Ro=M/2 : R< M/2 
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then we can combine these two yo equations into one: 
w= 2+ pipes 


Note that Be is continuous, and differentiable except along R = 
M/2. Substituting e = 2d + 1, the value y = a we are trying to 
approximate is 


1 g-e/2 = 1 g—d-1/2 — 1 9—4 
VI+M J1+M J2/1+M 


The relative error 
of the method, is 


y—yo 
y 


, which is how we will measure the quality 


a R a = 
z T2 d (2+ 6p )2®! 192—d 


1 9Q-d 
J2/1+M 


which simplifies to |e9(M, R)|, 
E€o(M, R) =]— V2V1 + M(2 + Be 


Note that this no longer depends on d, and that M can be any value 
in J = [0,1). This formula is only valid if E is even, thus has a hidden 
dependence on E. 

Suppose we want Rə so that the relative error maxmez |€o| < 1/8. 
Then 


1 
gene 
> |eo(0, Re)| 


Š 1 — /2(2 + 2R,)2% 7192 
Hi |1 — (1 oh Roe | 


Expanding, 
—1/8 <1—(1+ R,)2% 1 < 1/8 
9 9 T 7 
> > 9Rı—190.5 > > 
8 ~ 8(1+ Rə) 8(1+R2) 16 


where the last step used the fact that Ro € I > (1+ Rə) € [1, 2). 
Taking log, and adding 190.5 gives 190.7 > Rı > 189.2, so Rı = 190 = 
Oxbe is the unique integer that has any chance of keeping the relative 
error below 0.125 for even E. In bit positions 24 through 30, this gives 
the leading (Oxbe>>1) = Ox5f part of the magic constant R (as well 
as requiring bit 23 to be 0). 
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Figure 1 Figure 2 


To illustrate, Figure 1 shows yo as a solid line and the function 
(2+2M)~'/?2~¢ needing approximated as a dashed line, for Rə = 0.43, 
d = 2. It is clear that yo is a nonlinear approximation; however, by 
being nonlinear, it actually fits better! Figure 2 shows the relative error 
leo(M, Rə)| for Rə € I and M € T. It is clear that the constant Rə cross 
section with the smallest maximal error is approximately Rə = 0.45 


4.2. Exponent E Odd. With the previous case under our belt, we 
analyze the harder case. The difference is that the odd bit from the 
exponent E shifts into the high bit of the mantissa from the code i>>1. 
This adds t to the real value of the shifted mantissa, which becomes 
LM/2] + 4, where the truncation is as integers and the addition is as 
real numbers. e = E — 127 = 2d is even. Similar to above the new 
exponent is 


2 

2d + 127 
- n- 
= Rı—63-d 


R,-|E£/2) = Ri- er 


This must be positive as in the previous case. 

Again write M/2 instead of |//2| for the reasons above. The new 
mantissa is R2—(M/2+1/2) as real numbers when Rọ > “", requiring 
no borrow from the exponent. Then 


yo = (1+ Ry —(M/2 + 1/2))20-63-9-127 
(1/2 + Re — M/2)2™—0-4 
= (2+ 4R — 2M)2™ -1-4 
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Again we choose the same exponent for yo as in the case when FE is 

even. If Ro < Meh the subtraction R—(i<<1) needs to borrow one 

from the (positive) exponent, which divides yo by 2, and the bits in the 

mantissa field are then 1+ Rə — (4 + 5) as real numbers, which is still 
M+1 


in I. So if Ry < == we get for the initial guess 


w = TAR a 
(3/2 + Rə = M/2)2®7191-4 
= (3 + 2R — M)2F17192-4 


Defining (similarly to R2) 


Ro 4Ry —2M : 2R > M+1 
MM =] 1+2R-M : 2R<M+1 


we can write both of these yo at once as 
m= (2+2 


Note that yR is continuous, and differentiable except along 2R = 
M +1, so yo is also. Using e = 2d, we want yo to approximate 


1 1 1 
A~e SEM JI+M 


Note this is not the same function we were approximating in the case 
E even; it is off by a factor of V2. 

The relative error (which we want to minimize) simplifies as above 
to |e,(M, R)|, where 


e\(M, R) = 1 — V1 + M(2 + yy?)2" 71? 


again independent of d (but with the same hidden dependence on E as 
above). 

Also as before, if we want the relative error maxyer |€1| < 1/8 for 
E odd, we can take M = 0 (or M = 1) and analyze, but since we 
want the same constant Rı to work for E even and for E odd, we 
take Rı = 190 from above for both cases. Note that this satisfies the 
earlier requirement that Rı > 128. So for the rest of this note assume 
R, = 190 = Oxbe and redefine the two error functions as 


éo(M, Ro) = 1—-V2V1+ M(2+4 Bi?) /4 
€1(M, Rə) 1—V1+ M(2+4 yf) /4 
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depending only on M and Ro, each taking values in I. At this point 
we thoroughly understand what is happening, and computer tests con- 
firm the accuracy of the model for the approximation step, so we have 
achieved goal 1). 


4.2.1. An Example. When x = 16, E = 4+ 127 = 131, M = 0, and 
a bit from the exponent shifts into the mantissa. So after the line 
= Ox5f£3759df - (i>>1), the initial guess yo is Ox3E7759DF. The 
new exponent is 190 — |131/2| = 125 = 0x7d, which corresponds 
to e = 125 — 127 = —2. The new mantissa needs to borrow a bit 
from the exponent, leaving e = —3, and is Oxb759df - 0x4000000 
= 0x7759DF, corresponding to M = 0.932430148. Thus yo = (1 + 
M)2-3 = 0.241553, which is fairly close to a = 0.25. 
4.3. Optimizing the Mantissa. We want the value of Ry € I that 
minimizes maxyer{|€o(M, Ro)|, |e1(, Rə)|}. Fix a value of Rə, and 
we will find the value(s) of M giving the maximal error. Since ¢9 and 
€, are continuous and piecewise differentiable, this M will occur at an 
endpoint of a piece or at critical point on a piece. 


4.3.1. Maximizing |eo|. Start with the endpoints for ¢9: M = 0, M = 
1, and Ry = M/2 (where 34? is not differentiable). When M = 0, 
Bi? = ORs Let 


1+R 
pe 


fil Re) = €o(0, Ro) = 1 — V2/1+4 0(2 + 2R,)/4 = V 


Similarly, when Rə = M/2, BR = 


14 2Rp 
2 


fo(Ro) = €0(2Re, Ro) =1— V2./1 + 2R2(2+0)/4=1- 


When M = 1 we need to consider the two cases for 8f? wo giving 


f3( Ro) = €o(1, R2) = 
E(1—2Rə) : R< 1/2 


Checking the critical points takes two cases. Assuming Rə > M/2, 
solving 5 ðo — () gives the critical point M = = Ro, giving 


fal R2) = £0 Ce, Ro) =1-(1+ =F)? )v3 
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When Rə < M/2, the critical point is M = (1+ R2). This can only 
happen if Rə € [0, +), so define 
0 : Ro > 1/2 
fs( Ro) = 


3/2 
€o(3(1+ Ro), Ro) = 1- HRL : R<1/2 


4.3.2. Maximizing |e,|. Similar analysis on ¢, yields 


E iho SS see 
fala) = 2100.8) = EVERT > R< 1/2 


1- vV2Rə : Rə > 1 
f(a) = ex(1, Ra) =f Piri. ¢- Reet 


At the point R = ot. aie = 2, but this can only occur when 
Rə > 1/2, so we get 


fe(Rs) = í e1(2Ro— 1, Ro) = 1- V2Ra > R> 1/2 
0: R< 1/2 
The critical points again need two cases. If Rə > M/2 + 1/2 the 
critical point is M = 21, This only occurs if Rə > L. Similarly, 
for Rə < MH we obtain the critical point M = ote which requires 
Rə < 5. So we define both at once with 


e1(224=, Ry) =1- (B+R) Rp 21/2 


fla) = { Paar ey =~ Vaan : R< 1/2 


Note all f; are continuous except fy. 


4.3.3. Combining Errors. So, for a given Ro, the above 9 functions give 
9 values, and the max of the absolute values is the max error for this 
Rə. So we now vary Rə € I to find the best one. 


0.5 0.08 
0.4 

0.06 
0.3 

0.04 
0.2 
0.1 0.02 

R2 R2 
0.2 0.4 0.6 0.8 1 0.42 0.44 0.46 0.48 0.5 


Figure 3 Figure 4 
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Figure 3 is a Mathematica [8] plot of the error functions |f;(R2)|. 
The max of these is lowest near Rə = 0.45; Figure 4 is a closer look. 
The numerical solver FindMinimum in Mathematica finds the common 
value of Rə that minimizes max; | f;|, returning 


To = 0.432744889959443195468521587014 


where this many digits are retained since this constant can be applied 
to any bitlength of floating point number: 64 bit doubles, 128 bit fu- 
ture formats, etc. See the conclusion section. Note ro is very close 
to the original value in the code which is approximately 0.432430148. 
ro corresponds to the integer Ry = |2ro + 0.5] = 0x37642f. At- 
taching Rı = Oxbe, shifted to Ox5f, the optimal constant becomes 
R = 0x5f£37642f. 


5. TESTING 


We test the analysis using the following function that only computes 
the linear approximation step (the Newton step is removed!). 


float InvSqrtLinear(float x, int magic) 


{ 

float xhalf = 0.5f*x; 

int i = *(int*)&x; // get bits for floating value 
i = magic - (i>>1); // gives initial guess yo 

x = *(float*)&i; // convert bits back to float 
return x; 

} 


We apply the function to the initial constant, the constant derived 
above, and the constant 0x5£375a86 (the reason will be explained be- 
low). Also, we test each constant with the Newton step performed 1 
and 2 iterations (the added time to run the second Newton step was 
very small, yet the accuracy was greatly increased). Each test is over 
all floating point values. The maximal relative error percentages are 


| Value Predicted | Tested | 1 Iteration | 2 Iterations 


| 0x5£3759df | 3.43758 | 3.43756 | 0.175228 | 4.66e-004 
| Ox5£37642f | 3.42128 | 3.42128 | 0.177585 | 4.77521e-004 
| 0x5£375a86 || 3.43655 | 3.43652 | 0.175124 | 4.65437e-004 


So the analysis was correct in predicting that the new constant would 
approximate better in practice. Yet surprisingly, after one Newton 
iteration, it has a higher maximal relative error. Which again raises 
the question: how was the original code constant derived? 

The reason the better approximation turned out worse must be in 
the Newton iteration. The new constant is clearly better in theory and 
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practice than the original for initial approximation, but after 1 or 2 
Newton steps the original constant performed better. Out of curiosity, 
I searched numerically for a better constant. The results of the theory 
implied any good approximations should be near those few above, thus 
limiting the range of the search. 

Starting at the initial constant, and testing all constants above and 
below until the maximal relative error exceeds 0.00176 gives the third 
constant 0x5f375a86 as the best one; each was tested over all floating 
point values. The table shows it has a smaller maximal relative error 
than the original one. So the final version I propose is 


float InvSqrt(float x) 


{ 

float xhalf = 0.5f*x; 

int i = *(int*)&x; // get bits for floating value 

i = 0x5f£375a86- (i>>1); // gives initial guess yo 

x = *(float*) &i; // convert bits back to float 

x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy 
return x; 

} 


Thus the initial goal 2) is reached, although not as directly as planned. 
Goal 3) remains open :) 
The C++ code and Mathematica 4.0 code are available online [9]. 


6. CONCLUSION AND OPEN QUESTIONS 


This note explains the “magic” behind the constant in the code, and 
attempts to understand it in order to improve it if possible. The new 
constant 0x5f£375a86 appears to perform slightly better than the origi- 
nal one. Since both are approximations, either works well in practice. I 
would like to find the original author if possible, and see if the method 
was derived or just guessed and tested. 

The utility of this note is explaining how to go about deriving such 
methods for other functions and platforms. With the reasoning meth- 
ods above, almost any such method can be analyzed. However the 
analysis should be thoroughly tested on a real machine, since hardware 
often does not play well with theory. 

The above derivations are almost size neutral, so can be applied to 64 
bit double, or even longer packed types to mimic SIMD instructions 
without hardware support. The analysis allows this method to be 
extended to many platforms and functions. 

For example, the double type has an 11 bit exponent, biased by 
1023, and a 52 bit mantissa. After deriving yo with the same form 
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for the mantissa as above, only the bound on R,; needs reworked, and 
the rest of the analysis will still hold. The same constant ro can used 
for any bit size in this manner. Here Rı = 1534, and the best initial 
approximation is R = R125? + |ro * 2°2| = Ox5fe6ec85e7de30da. A 
quick test shows the relative error to be around 0.0342128 for the initial 
approximation, and 0.0017758 after 1 Newton iteration. 

A final question is to analyze why the best initial approximation to 
the answer is not the best after one Newton iteration. 


6.1. Homework: A few problems to work on: 

1. Derive a similar method for sqrt (x) 

2. Which is better (faster? more accurate?): a version that works for double or 
2 Newton steps? 

3. Confirm the best initial approximation for 64 bit IEEE 754 size type double. 
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